Bioinformatics Advance Access published June 4, 2007
Abstract
network from more comprehensive aspects: variation of gene expression level, gene co-expression between directly connected proteins, and the topology of the sub-network, whereas many existing methods take only some of these factors into account. Because the search procedure is edge-based, growing the sub-network through connected edges rather than vertices, it constructs a sub-network that retains the topological structure of condition-relevant interactions, an improvement over previous vertex-based methods.
We applied the proposed method to the human PPI network from HPRD (Peri et al., 2004) using a gene expression dataset of prostate cancer (Lapointe et al., 2004), and to the yeast PPI network from DIP (Salwinski et al., 2004) using a gene expression dataset of the cell cycle (Spellman et al., 1998). The results demonstrated that our method captured relevant interaction behaviors under the investigated conditions with improved efficiency. For the prostate cancer dataset, taking prostate cancer related genes (Li et al., 2003) as seeds, we incorporated the edge-based seed expansion approach (Chen et al., 2006) to explore the network in more detail. An advantage of the seed expansion approach is that it can directly use prior knowledge of known disease proteins.
The PPI data were derived from the physical PPI datasets of DIP (2006 release; Salwinski et al., 2004) and HPRD (Release 6; Peri et al., 2004). We processed the data as follows: (i) removing self-interactions; (ii) removing duplicate interactions.
The prostate dataset (Lapointe et al., 2004) consists of about 26,000 genes measured in 71 prostate tumors, as well as 41 normal prostate specimens. The expression dataset for the cell cycle (Spellman et al., 1998) contains the relative expression changes of yeast genes during the cell cycle, measured at 77 different time points.
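The two PPI cleaning steps listed above, removing self-interactions and removing duplicate interactions, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `clean_ppi_edges` and the protein identifiers are hypothetical, and treating interactions as undirected (so that A-B and B-A count as one interaction) is an assumption the text does not state.

```python
def clean_ppi_edges(pairs):
    """Remove self-interactions and duplicate interactions from an edge list.

    Assumption: interactions are undirected, so (A, B) and (B, A) are
    treated as the same edge.
    """
    seen = set()
    cleaned = []
    for a, b in pairs:
        if a == b:
            continue  # (i) drop self-interactions
        edge = tuple(sorted((a, b)))  # canonical order for undirected edges
        if edge in seen:
            continue  # (ii) drop duplicate interactions
        seen.add(edge)
        cleaned.append(edge)
    return cleaned


# Hypothetical toy input: one self-interaction and two repeats of P1-P2.
raw = [("P1", "P2"), ("P2", "P1"), ("P3", "P3"), ("P2", "P4"), ("P1", "P2")]
edges = clean_ppi_edges(raw)
print(edges)  # [('P1', 'P2'), ('P2', 'P4')]
```

Storing each edge in sorted order keeps the duplicate check to a single set lookup while preserving the order in which interactions first appear.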
For each of the above cDNA microarray datasets, we filtered out genes with missing data in more than 10% of arrays and applied a base-2 logarithmic transformation (Wang et al., 2006). We then normalized the data so that the observations in every array had mean 0 and standard deviation 1. By integrating the processed PPI and expression data, we constructed the entire network to be searched for the condition-responsive sub-network. Briefly, from a PPI network with proteins as vertices and interactions as edges, we deleted the vertices without gene expression data. Finally, the entire network to be searched contained 6509 vertices with 23157 edges for the prostate cancer dataset, while the entire network contained 3619 vertices …
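The preprocessing described in this paragraph (missing-data filter, base-2 logarithm, per-array standardization, and deletion of vertices without expression data) could be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the toy data, the use of the population standard deviation, the choice to skip missing values when computing array statistics, and the assumption that proteins and genes share identifiers are all ours.

```python
import math
from statistics import mean, pstdev


def preprocess_expression(expr, max_missing_frac=0.10):
    """expr maps gene -> list of positive raw ratios, one per array (None = missing)."""
    # Keep genes whose fraction of missing arrays does not exceed the cutoff.
    kept = {
        g: vals for g, vals in expr.items()
        if sum(v is None for v in vals) / len(vals) <= max_missing_frac
    }
    # Base-2 logarithmic transformation; missing values stay missing.
    logged = {
        g: [None if v is None else math.log2(v) for v in vals]
        for g, vals in kept.items()
    }
    # Standardize each array (column) to mean 0 and standard deviation 1.
    # Assumption: every array has at least two distinct observed values.
    n_arrays = len(next(iter(logged.values()))) if logged else 0
    for j in range(n_arrays):
        col = [vals[j] for vals in logged.values() if vals[j] is not None]
        mu, sd = mean(col), pstdev(col)  # population SD is an assumption
        for vals in logged.values():
            if vals[j] is not None:
                vals[j] = (vals[j] - mu) / sd
    return logged


def restrict_network(edges, genes_with_expression):
    """Delete vertices lacking expression data, together with their edges."""
    return [(a, b) for a, b in edges
            if a in genes_with_expression and b in genes_with_expression]


# Hypothetical toy data: G3 is missing in 50% of arrays and is filtered out.
expr = {"G1": [1.0, 4.0], "G2": [4.0, 1.0], "G3": [None, 2.0]}
norm = preprocess_expression(expr)
network = restrict_network([("G1", "G2"), ("G2", "G3")], norm)
print(norm)
print(network)
```

In this toy case only G1 and G2 survive the filter, so the G2-G3 interaction is dropped from the network to be searched.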